ImageNet Training in 24 Minutes

نویسندگان

  • Yang You
  • Zhao Zhang
  • James Demmel
  • Kurt Keutzer
  • Cho-jui Hsieh
  • Cho-Jui Hsieh
چکیده

Finishing 90-epoch ImageNet-1k training with ResNet-50 on a NVIDIA M40 GPU takes 14 days. This training requires 10 single precision operations in total. On the other hand, the world’s current fastest supercomputer can finish 2× 10 single precision operations per second (Dongarra et al. 2017). If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in five seconds. However, the current bottleneck for fast DNN training is in the algorithm level. Specifically, the current batch size (e.g. 512) is too small to make efficient use of many processors For large-scale DNN training, we focus on using large-batch data-parallelism synchronous SGD without losing accuracy in the fixed epochs. The LARS algorithm (You, Gitman, and Ginsburg 2017) enables us to scale the batch size to extremely large case (e.g. 32K). We finish the 100-epoch ImageNet training with AlexNet in 24 minutes, which is the world record. Same as Facebook’s result (Goyal et al. 2017), we finish the 90-epoch ImageNet training with ResNet-50 in one hour. However, our hardware budget is only 1.2 million USD, which is 3.4 times lower than Facebook’s 4.1 million USD.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also d...

متن کامل

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train

For the past 5 years, the ILSVRC competition and the ImageNet dataset have attracted a lot of interest from the Computer Vision community, allowing for state-of-the-art accuracy to grow tremendously. This should be credited to the use of deep artificial neural network designs. As these became more complex, the storage, bandwidth, and compute requirements increased. This means that with a non-di...

متن کامل

Efficient Training of Convolutional Neural Nets on Large Distributed Systems

Deep Neural Networks (DNNs) have achieved impressive accuracy in many application domains including image classification. Training of DNNs is an extremely computeintensive process and is solved using variants of the stochastic gradient descent (SGD) algorithm. A lot of recent research has focussed on improving the performance of DNN training. In this paper, we present optimization techniques to...

متن کامل

PowerAI DDL

As deep neural networks become more complex and input data-sets grow larger, it can take days or even weeks to train a deep neural network to the desired accuracy. Therefore, distributed Deep Learning at a massive scale is a critical capability, since it offers the potential to reduce the training time from weeks to hours. In this paper, we present a software-hardware co-optimized distributed D...

متن کامل

What makes ImageNet good for transfer learning?

The tremendous success of features learnt using the ImageNet classification task on a wide range of transfer tasks begs the question: what are the intrinsic properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature qu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017